134151 stories
·
0 followers

Telegramgate - Google Search

1 Share
Read the whole story
Michael_Novakhov
11 seconds ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete

Telegramgate - Google Search

1 Share
Images may be subject to copyright. Learn More
Read the whole story
Michael_Novakhov
39 seconds ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete

Telegramgate - Google Search

1 Share
Ricky Renuncia Puerto Rico RickyLeaks TelegramGate T-Shirt - <a href="http://Teefoody.com" rel="nofollow">Teefoody.com</a>
$19.00* · In stock · Brand: Ricky Renuncia Bandera Negra Puerto Rico
Gobernador Ricardo Rossello. Ricky Renuncia. #rickyleaks #Rosellorenuncia #rosello #renuncia #rickyrenuncia. Puerto Rican Governer must step down!
Read the whole story
Michael_Novakhov
1 minute ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete

Telegramgate - Google Search

1 Share
Read the whole story
Michael_Novakhov
1 minute ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete

Who is LuisG in Telegramgate? - Google Search

1 Share

Showing results for Who is Luis in Telegramgate?
Search instead for Who is LuisG in Telegramgate?

Search Results

Featured snippet from the web

Telegramgate, also known as Chatgate or RickyLeaks, was a political scandal involving Ricardo Rosselló, then Governor of Puerto Rico, which began on July 8, 2019, with the leaking of hundreds of pages of a group chat on the messaging application Telegram between Rosselló and past and current members of his staff.

Web results

Jul 9, 2019 - Later on, Rosselló quipped that he and “Luis G.” should take vacations so that Vázquez would become interim governor, just to see what she ...
Telegramgate, also known as Chatgate or RickyLeaks, was a political scandal involving Ricardo Rosselló, then Governor of Puerto Rico, which began on July 8, 2019, with the leaking of hundreds of pages of a group chat on the messaging application Telegram between Rosselló and past and current members of his staff.
Luis Rivera Marín. Luis Gerardo Rivera Marín is an attorney-at-law and notary and a former secretary of state of Puerto Rico.
Political party‎: ‎New Progressive
Born‎: ‎San Juan, Puerto Rico‎, U.S
Other political affiliations‎: ‎Republican
Jul 15, 2019 - ... been referred on social media as #TelegramGate and #RickyLeaks, ... of the governor and his administration,” the investigative reporters Luis ...

Read the whole story
Michael_Novakhov
4 minutes ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete

Telegramgate Analysis in Python - Towards Data Science

1 Share

When working with text, it is important to be aware of the original document structure, to see what certain parts mean and their relevance to the document’s content.

To process the document, we first read it using Tika, and then we remove various unimportant parts of the text using regular expressions.

Read the PDF

First we read in the telegramgate PDF document (which you can download here) using Tika, and we preview the first 200 characters.

Clean the PDF

We notice that the document has a lot of new lines, so we remove them using the string method .strip(), and now the first 399 characters of pdf_content look like this:

[Notice that the line ‘lComo se col6 eso?’ was misread from ‘¿Como se coló eso?’. Tika read some upside down punctuation marks and accented letters incorrectly, for example, it read ó as 6 and í as f. We do not propose a fix for this in this article.]

The date “1/20/2019” and the text “Telegram Web” appear at the beginning of every page, which most likely implies that the Telegram conversation was downloaded on that date, January 20 of 2019. At the end of every page there is a telegram URL and the page you are on out of all pages in the PDF. We can use the page number for keeping track of the page in the PDF for which a message appears, so we should keep that. But the date and text “Telegram Web” at the top and the link at the bottom of every page should be removed, it is not useful to the analysis.

A good place to test your regular expressions is at https://regex101.com/.

There are various time stamps which do not align with any messages.
We will assume that the time in which a message was sent is unimportant to the analysis, and thus will be removed.

When studying pdf_content, you may notice that there are a lot of two letter acronyms. From viewing the PDF, you can tell that a lot of these 2-letter acronyms correspond to the chat members which do not have profile pictures. These acronyms are not useful to the analysis of the document, but we should not attempt to remove them until we know who are the members who participated in the chat. We will remove them in the next section titled “Organize Content”, where we determine who are the admin chat members.

Create a list from the PDF’s content

After these steps we could not identify any more strings to remove from the plain text, so we create a list from the pdf_content by separating the string by the new lines, and we start extracting information.

Now we are ready to organize the data.

In order to organize/structure the information from the PDF, we decide to identify the chat members and which messages were sent per member.

We could use text formatting to automatically identify all members present in the chat, but we were not able to read in the PDF document with formatting included (we read it in using the Tika CLI and the tags xml and html but neither kept the original formatting of the PDF).

So we will have to suffice with identifying the admins automatically and identifying the other members manually (feel free to leave a comment on ways we could automatically retrieve non-admin chat members from the PDF).

Identify admin chat members

When reviewing the PDF’s plain text, we noticed that the lines containing an admin user’s name contain the term ‘admin’. We isolate the lines which contain this term, in order to identify the admin users in the chat.

Now we clean these results by removing the text ‘ admin’ and the text that follows it.

Some admin users are repeated, differing only by the text ‘via@gif’. We remove the elements which contain this text, since those elements are redundant.

When examining the resulting list, you can tell that there are two elements which are most likely typos (user names misread by Tika). These elements are ‘Fdo’ and ‘R Russello’. We remove these elements and have our final list of admin chat members.

The misread usernames are still present in the document. We replace them with the correct spelling.

There is a total of 8 admin members in the chat group. The admin members are the following:

  • Ch Sobri (Christian Sobrino): He was the executive director of the Puerto Rico Fiscal Agency and Financial Advisory Authority (AAFAF). He quit from his various roles in the puerto rican goverment on July 13, 2019, after the chat messages were leaked (source).
  • Carlos Bermudez: He was the governor’s communication’s adviser. He quit on July 13, 2019. He was the first member of the chat group to quit after the chat was leaked (source).
  • Rafael Cerame D’Acosta: Also a communication’s adviser for the governor. He no longer maintains any contracts with the government (source).
  • Edwin Miranda: CEO at KOI. Publicist and close collaborator of the governor.
  • F do (Elías Sánchez Sifonte): Ex-representative of the governor in the Financial Oversight and Management Board.
  • Ramon Rosario: Ex-secretary of public affairs and public policy (from January 2017 to December 2018). He was not working for the government in the time frame of the leaked messages.
  • R Rossello (Ricardo Rosello): Governor of Puerto Rico from January 2, 2017 to present.
  • Alfonso Orona: Prior legal adviser (chief legal officer of La Fortaleza from January 2017 to February 2019).

Identify non-admin chat members

According to the first line of the PDF, which says ‘WRF 12 members’, there is a total of 12 members in the chat group. This means that there are 4 non-admin chat members in the group. After going through the PDF, we found that the remaining 4 chat members are:

  • Raul Maldonado
  • Anthony O. Maceira Zayas
  • Ricardo Llerandi
  • LuisG

Unfortunately we were not able to extract this information within the Python script. We will be adding these usernames together with the admin usernames into a consolidated list of total chat members.

It’s not cheating …

Remove 2-letter username acronyms

Now that we know who the chat members are, we can remove the 2-letter acronyms associated to some of the chat members who don’t have profile pictures. We build all potential acronyms for the users.

Now that we have the potential acronyms, we search which elements in pdf_lines contain only one of these acronyms and we remove those lines.

Organize

Now to the meat!

We create a list of dictionaries (which we’ll call conversation)in which each dictionary element has the following keys:

  • message: Message sent in chat.
  • chat_member: Chat member who sent the message.
  • date: Date (weekday, month, day, year) in which the message was sent.
  • page_number: Page within the PDF in which the message appears.

We use the lines containing the chat member usernames as indicators of who sent which message. These lines are not be included within the final list, since they are not part of the conversation.

With the data now cleaned up and organized, we can create some visualizations to showcase data insights.

We decided to create a horizontal bar chart where each bar corresponds to a chat group member and the length of the bar represents the number of messages sent by a chat member.

We chose to color each bar differently, so we randomly generate colors and assign a different color to each chat member.

If you are running this code and you do not like the generated colors, feel free to rerun the code and create different colors, or manually define the colors you want per user.

We get the number of messages per chat member and store them into a dictionary where the keys are the usernames and the values are the message counts.

We used matplotlib to draw the bar chart.

The following is the resulting graph:

From the diagram we can see that Edwin Miranda sent the most messages. This may be because he shared many news articles and commented on them in the chat group, as can be seen from the PDF.

We decided to create an abridged telegramgate PDF document, to see how much shorter the document can be made and if the chat’s content can be shown in a clearer, more straightforward format. The resulting PDF would not include images or GIFs, and it would not differentiate shared posts from written posts.

In order to create the PDF, we first create a string in HTML format using the content in the variable conversation. We write the page number, the username in the colors generated for the visualizations in the previous section, and the message sent by the user. We did not write the date the message was sent. After creating the HTML string, we save it into a file on our local folder, which we name “telegramgate_abridged.html”. Then we use the PDFkit Python library to create the PDF file from the HTML file, and we save the result to “telegramgate_abridged.pdf”.

The resulting PDF is 654 pages long, while the original document is 899 pages. This means the new PDF has 245 fewer pages than the original.

We chose to color-code the usernames in order to easily differentiate who wrote the messages. When the usernames are of the same length and color, they can visually blend together. If you wish to create the PDF without colored usernames, you can do so by removing the font tags added within the for loop.

You can access the PDF generated from this script here.

Read the whole story
Michael_Novakhov
14 minutes ago
reply
http://michael_novakhov.newsblur.com/
Share this story
Delete
Next Page of Stories